NoSQL Systems: Overview

NoSQL Systems 

■ Not every data management/analysis problem
is best solved exclusively using a traditional DBMS

■ "NoSQL" = "Not Only SQL"
NoSQL Systems

Alternative to traditional relational DBMS

+ Flexible schema

+ Quicker/cheaper to set up

+ Massive scalability

+ Relaxed consistency → higher performance & availability

- No declarative query language → more programming

- Relaxed consistency → fewer guarantees
NoSQL Systems

Several incarnations

■ MapReduce framework (OLAP)

■ Key-value stores (OLTP)

■ Document stores

■ Graph database systems
MapReduce Framework

Originally from Google, open source Hadoop

■ No data model, data stored in files

■ User provides specific functions:
map(), reduce(), reader(), writer(), combiner()

■ System provides data processing "glue", fault-tolerance, 
scalability 
Map and Reduce Functions

Map: Divide problem into subproblems

map(item) → 0 or more <key, value> pairs

Reduce: Do work on subproblems, combine results
MapReduce Example: Web log analysis

Each record: UserID, URL, timestamp, additional-info
Task: Count number of accesses for each domain (inside URL)
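
The counting task can be sketched in a few lines of Python. This is a minimal single-process illustration, not how Hadoop is actually invoked: run_mapreduce is a hypothetical stand-in for the system-provided "glue", and the record layout (a 4-tuple) and sample URLs are assumptions made for the example.

```python
from collections import defaultdict
from urllib.parse import urlparse


def map_access(record):
    """Map: one log record -> zero or more (key, value) pairs."""
    user_id, url, timestamp, additional_info = record
    domain = urlparse(url).netloc
    if domain:                      # skip records whose URL has no domain
        yield domain, 1


def reduce_count(domain, counts):
    """Reduce: all values collected for one key -> output records."""
    yield domain, sum(counts)


def run_mapreduce(records, map_fn, reduce_fn):
    """Hypothetical single-process stand-in for the system "glue"."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)        # shuffle: group values by key
    results = []
    for key in sorted(groups):
        results.extend(reduce_fn(key, groups[key]))
    return results


log = [
    ("u1", "http://example.com/a", 1, ""),
    ("u2", "http://example.org/b", 2, ""),
    ("u1", "http://example.com/c", 3, ""),
]
domain_counts = run_mapreduce(log, map_access, reduce_count)
print(domain_counts)
```

The user writes only map_access and reduce_count; in a real framework the grouping step, distribution across machines, and recovery from failures are handled by the system.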
MapReduce Example (modified #1)

Each record: UserID, URL, timestamp, additional-info

Task: Total "value" of accesses for each domain based on
additional-info
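
One way to picture this variant: map emits a score per access instead of a 1, and a combiner pre-sums scores within each input split before the reduce step. The sketch below is illustrative only. The SCORES table, the string form of additional-info, and the helper total_value are all assumptions, since the slide does not say how "value" is derived.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical scoring table: the slide does not define how "value" is computed.
SCORES = {"view": 1, "click": 2, "purchase": 10}


def map_value(record):
    """Map: emit (domain, score) for one access record."""
    user_id, url, timestamp, additional_info = record
    yield urlparse(url).netloc, SCORES.get(additional_info, 0)


def combine(pairs):
    """Combiner: pre-sum scores per domain within one input split."""
    partial = defaultdict(int)
    for domain, score in pairs:
        partial[domain] += score
    return list(partial.items())


def reduce_total(domain, scores):
    """Reduce: add up the partial sums for one domain."""
    return domain, sum(scores)


def total_value(splits):
    """Hypothetical driver: each split could be processed on its own node."""
    groups = defaultdict(list)
    for split in splits:
        mapped = [pair for record in split for pair in map_value(record)]
        for domain, partial in combine(mapped):
            groups[domain].append(partial)
    return dict(reduce_total(d, s) for d, s in groups.items())


splits = [
    [("u1", "http://example.com/a", 1, "view"),
     ("u2", "http://example.com/b", 2, "purchase")],
    [("u3", "http://example.org/c", 3, "click"),
     ("u1", "http://example.com/d", 4, "click")],
]
totals = total_value(splits)
print(totals)
```

The combiner works here because addition is associative and commutative, so summing partial sums gives the same total as summing the raw scores.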
MapReduce Example (modified #2)

Each record: UserID, URL, timestamp, additional-info

Separate records: UserID, name, age, gender, ...

Task: Total "value" of accesses for each domain based on
user attributes
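
Because the value now depends on a second set of records, the two inputs have to be joined on UserID first. A rough sketch of one common approach follows: a first job keyed by UserID performs the join, and a second job keyed by domain sums the values. The scoring rule value_of, the function names, and the sample data are assumptions for illustration only.

```python
from collections import defaultdict
from urllib.parse import urlparse


def value_of(user):
    """Hypothetical scoring rule based on user attributes (here: age only)."""
    return 2 if user["age"] < 30 else 1


def join_by_user(accesses, users):
    """Job 1: key both inputs by UserID, then join them in the reduce step."""
    groups = defaultdict(lambda: {"user": None, "urls": []})
    for user_id, name, age, gender in users:            # map over user records
        groups[user_id]["user"] = {"name": name, "age": age, "gender": gender}
    for user_id, url, timestamp, info in accesses:      # map over access records
        groups[user_id]["urls"].append(url)
    joined = []
    for group in groups.values():                       # reduce: one group per UserID
        if group["user"] is not None:                   # drop accesses by unknown users
            joined.extend((url, group["user"]) for url in group["urls"])
    return joined


def total_by_domain(joined):
    """Job 2: key by domain and sum the per-access values."""
    totals = defaultdict(int)
    for url, user in joined:
        totals[urlparse(url).netloc] += value_of(user)
    return dict(totals)


users = [("u1", "Ann", 25, "F"), ("u2", "Bob", 40, "M")]
accesses = [
    ("u1", "http://example.com/a", 1, ""),
    ("u2", "http://example.com/b", 2, ""),
    ("u1", "http://example.org/c", 3, ""),
    ("u9", "http://example.org/d", 4, ""),   # no matching user record
]
domain_totals = total_by_domain(join_by_user(accesses, users))
print(domain_totals)
```

Chaining two jobs like this is a small instance of a "workflow" of MapReduce jobs; the join that a declarative query would express in one line has to be programmed by hand.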
MapReduce Framework

■ No data model, data stored in files

■ User provides specific functions

■ System provides data processing "glue", fault-tolerance,
scalability
MapReduce Framework

Schemas and declarative queries are missed

Hive: schemas, SQL-like query language

Pig: more imperative but with relational operators

■ Both compile to "workflow" of Hadoop (MapReduce) jobs

Dryad allows user to specify workflow

■ Also DryadLINQ language
Key-Value Stores

Extremely simple interface

■ Data model: (key, value) pairs

■ Operations: Insert(key, value), Fetch(key),
Update(key), Delete(key)

Implementation: efficiency, scalability, fault-tolerance

■ Records distributed to nodes based on key

■ Replication

■ Single-record transactions, "eventual consistency"
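
The interface and the first two implementation points can be illustrated with a toy in-memory store. This is a sketch under stated assumptions, not the design of any particular system: each "node" is a Python dict, placement uses a simple hash modulo the node count, and update takes the new value as a second argument (the slide writes Update(key)). Writes reach all replicas immediately, so the sketch does not model eventual consistency.

```python
import hashlib


class KeyValueStore:
    """Toy store: records are hashed to nodes by key and replicated."""

    def __init__(self, num_nodes=4, replicas=2):
        self.nodes = [{} for _ in range(num_nodes)]   # each dict stands in for one node
        self.replicas = replicas

    def _owners(self, key):
        """Return the indexes of the nodes that hold copies of `key`."""
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        first = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return [(first + i) % len(self.nodes) for i in range(self.replicas)]

    def insert(self, key, value):
        for n in self._owners(key):
            self.nodes[n][key] = value

    def fetch(self, key):
        # Any replica can answer; try them in order.
        for n in self._owners(key):
            if key in self.nodes[n]:
                return self.nodes[n][key]
        return None

    def update(self, key, value):
        if all(key not in self.nodes[n] for n in self._owners(key)):
            raise KeyError(key)
        self.insert(key, value)

    def delete(self, key):
        for n in self._owners(key):
            self.nodes[n].pop(key, None)


store = KeyValueStore()
store.insert("user:1", "Ann")
store.update("user:1", "Ann B.")
print(store.fetch("user:1"))
```

Because every operation touches a single key, each request can be routed to the nodes that own that key without coordinating with the rest of the cluster, which is what makes this interface easy to scale.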
Key-Value Stores

Extremely simple interface

■ Data model: (key, value) pairs

■ Operations: Insert(key, value), Fetch(key),
Update(key), Delete(key)

■ Some allow (non-uniform) columns within value

■ Some allow Fetch on range of keys

Example systems

■ Google BigTable, Amazon Dynamo, Cassandra, 
Voldemort, HBase, ... 
Document Stores

Like Key-Value Stores except value is document

■ Data model: (key, document) pairs

■ Document: JSON, XML, other semistructured formats

■ Basic operations: Insert(key, document), Fetch(key),
Update(key), Delete(key)

■ Also Fetch based on document contents

Example systems 

■ CouchDB, MongoDB, SimpleDB, ... 
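
What distinguishes a document store from a plain key-value store is that the system can look inside the value. The sketch below is a hypothetical in-memory illustration and does not reflect the API of CouchDB, MongoDB, or SimpleDB: documents are kept as JSON text, and fetch_where is an assumed name for a content-based fetch that matches top-level fields by equality.

```python
import json


class DocumentStore:
    """Toy (key, document) store with JSON documents."""

    def __init__(self):
        self._docs = {}                       # key -> JSON text

    def insert(self, key, document):
        self._docs[key] = json.dumps(document)

    def fetch(self, key):
        text = self._docs.get(key)
        return None if text is None else json.loads(text)

    def delete(self, key):
        self._docs.pop(key, None)

    def fetch_where(self, **conditions):
        """Return (key, document) pairs whose top-level fields equal `conditions`."""
        matches = []
        for key, text in self._docs.items():
            doc = json.loads(text)
            if all(doc.get(field) == want for field, want in conditions.items()):
                matches.append((key, doc))
        return matches


docs = DocumentStore()
docs.insert("b1", {"title": "Draft A", "year": 2008})
docs.insert("b2", {"title": "Draft B", "year": 2011, "tags": ["notes"]})
hits = docs.fetch_where(year=2011)
print(hits)
```

The two sample documents have different fields, which the store accepts without any schema declaration. A real system would index document contents rather than scan every document as this sketch does.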
Graph Database Systems

■ Data model: nodes and edges

■ Nodes may have properties (including ID)

■ Edges may have labels or roles
Graph Database Systems

■ Interfaces and query languages vary 

■ Single-step versus "path expressions" versus full recursion 

■ Example systems 

Neo4j, FlockDB, Pregel, ... 

■ RDF "triple stores" can map to graph databases 
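
The three query styles named above (single-step, path expressions, full recursion) can be contrasted on a tiny example graph. This is a plain-Python sketch, not the query language of Neo4j, FlockDB, or Pregel; the node properties, edge labels, and function names are assumptions made for illustration.

```python
from collections import deque

# Nodes carry properties (keyed by ID); edges carry a label.
nodes = {
    1: {"name": "Ann"},
    2: {"name": "Bob"},
    3: {"name": "Cat"},
    4: {"name": "Dan"},
}
edges = [(1, "friend", 2), (2, "friend", 3), (3, "friend", 4), (1, "colleague", 4)]


def step(node_id, label):
    """Single step: neighbours reachable over one edge with the given label."""
    return [dst for src, lbl, dst in edges if src == node_id and lbl == label]


def path(node_id, labels):
    """Path expression: follow a fixed sequence of edge labels."""
    frontier = {node_id}
    for label in labels:
        frontier = {dst for src in frontier for dst in step(src, label)}
    return frontier


def reachable(node_id, label):
    """Full recursion: every node reachable over one or more `label` edges."""
    seen, queue = set(), deque([node_id])
    while queue:
        for dst in step(queue.popleft(), label):
            if dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return seen


friends = step(1, "friend")
friends_of_friends = path(1, ["friend", "friend"])
names = sorted(nodes[n]["name"] for n in reachable(1, "friend"))
print(friends, friends_of_friends, names)
```

The same edge list could be read as RDF-style triples (subject, predicate, object), which is why triple stores map naturally onto this model.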



Jennifer Widom 



NoSQL Systems: Overview 



NoSQL Systems

■ "NoSQL" = "Not Only SQL"

Not every data management/analysis problem
is best solved exclusively using a traditional DBMS

■ Current incarnations

- MapReduce framework

- Key-value stores

- Document stores

- Graph database systems